{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "[![AWS SDK for pandas](_static/logo.png \"AWS SDK for pandas\")](https://github.com/aws/aws-sdk-pandas)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "# 40 - EMR Serverless"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Amazon EMR Serverless is a new deployment option for Amazon EMR. EMR Serverless provides a serverless runtime environment that simplifies the operation of analytics applications that use the latest open source frameworks, such as Apache Spark and Apache Hive. With EMR Serverless, you don’t have to configure, optimize, secure, or operate clusters to run applications with these frameworks.\n",
    "More in [User Guide](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/emr-serverless.html)."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Spark"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Create a Spark application"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/var/folders/_n/7dm3ff5d5fb01gjt6ms150km0000gs/T/ipykernel_11468/3968622978.py:3: SDKPandasExperimentalWarning: `create_application`: This API is experimental and may change in future AWS SDK for Pandas releases. \n",
      "  spark_application_id: str = wr.emr_serverless.create_application(\n"
     ]
    }
   ],
   "source": [
    "import awswrangler as wr\n",
    "\n",
    "spark_application_id: str = wr.emr_serverless.create_application(\n",
    "    name=\"my-spark-application\",\n",
    "    application_type=\"Spark\",\n",
    "    release_label=\"emr-6.10.0\",\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Run a Spark job"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "iam_role_arn = \"arn:aws:iam::...:role/...\"\n",
    "\n",
    "wr.emr_serverless.run_job(\n",
    "    application_id=spark_application_id,\n",
    "    execution_role_arn=iam_role_arn,\n",
    "    job_driver_args={\n",
    "        \"entryPoint\": \"/usr/lib/spark/examples/jars/spark-examples.jar\",\n",
    "        \"entryPointArguments\": [\"1\"],\n",
    "        \"sparkSubmitParameters\": \"--class org.apache.spark.examples.SparkPi --conf spark.executor.cores=4 --conf spark.executor.memory=20g --conf spark.driver.cores=4 --conf spark.driver.memory=8g --conf spark.executor.instances=1\",\n",
    "    },\n",
    "    job_type=\"Spark\",\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Hive"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Create a Hive application"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/var/folders/_n/7dm3ff5d5fb01gjt6ms150km0000gs/T/ipykernel_11468/3826130602.py:1: SDKPandasExperimentalWarning: `create_application`: This API is experimental and may change in future AWS SDK for Pandas releases. \n",
      "  hive_application_id: str = wr.emr_serverless.create_application(\n"
     ]
    }
   ],
   "source": [
    "hive_application_id: str = wr.emr_serverless.create_application(\n",
    "    name=\"my-hive-application\",\n",
    "    application_type=\"Hive\",\n",
    "    release_label=\"emr-6.10.0\",\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Run a Hive job"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "path = \"s3://my-bucket/path\"\n",
    "\n",
    "wr.emr_serverless.run_job(\n",
    "    application_id=hive_application_id,\n",
    "    execution_role_arn=\"arn:aws:iam::...:role/...\",\n",
    "    job_driver_args={\n",
    "        \"query\": f\"{path}/hive-query.ql\",\n",
    "        \"parameters\": f\"--hiveconf hive.exec.scratchdir={path}/scratch --hiveconf hive.metastore.warehouse.dir={path}/warehouse\",\n",
    "    },\n",
    "    job_type=\"Hive\",\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.9.14",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
